This workshop was made to follow the Beginner R workshop given by Jake Leighton and uses the same built-in data sets. The main focus is on how to perform basic statistical tests in R (though not the statistical reasoning/interpretation behind them) and how to plot data with ggplot.
After completing this workshop you should have:
1. A basic understanding of how to put data into statistical functions and extract the results.
2. A basic understanding of the elements that make up a ggplot graphic.
3. Resources to find additional statistical tests and custom plots.
#if you have not yet done so, install the tidyverse
#this can take a while because it has a lot of dependencies
#install.packages("tidyverse")
#if you have not yet done so, install rstatix
#It should install quickly
#install.packages("rstatix")
#load the required packages for the workshop
library(tidyverse)
library(datasets)
library(rstatix)
#This workshop uses the built-in ChickWeight dataset. It was chosen because it is closely analogous to the data from a mouse study.
head(ChickWeight)
## weight Time Chick Diet
## 1 42 0 1 1
## 2 51 2 1 1
## 3 59 4 1 1
## 4 64 6 1 1
## 5 76 8 1 1
## 6 93 10 1 1
The R programming language was developed specifically for statisticians to analyze data and perform statistical tests. Because of this, it’s very easy to perform statistics in R, provided you know which test you want.
From the ChickWeight dataset, suppose you want to test for a difference in mean weight between the Diet 1 and Diet 2 chicks at day 21. This would be an unpaired, two-sample t-test.
#First get the Diet 1, day 21 weight data and save it to diet1.day21
#name = Chick keeps each chick's ID as the vector names
diet1.day21 <- ChickWeight %>% filter(Time==21) %>% filter(Diet==1) %>% pull(weight,name = Chick)
diet1.day21
## 1 2 3 4 5 6 7 9 10 11 12 13 14 17 19 20
## 205 215 202 157 223 157 305 98 124 175 205 96 266 142 157 117
#Then get the Diet 2, day 21 weight data and save it to diet2.day21
diet2.day21 <- ChickWeight %>% filter(Time==21) %>% filter(Diet==2) %>% pull(weight,name = Chick)
diet2.day21
## 21 22 23 24 25 26 27 28 29 30
## 331 167 175 74 265 251 192 233 309 150
R returns the results of most statistical tests as a list-based object of class “htest”.
#By default, t.test() performs a two-sided, unpaired Welch t-test (var.equal = FALSE) with a 95% confidence level.
diet.day21.ttest = t.test(x = diet1.day21, y = diet2.day21)
diet.day21.ttest
##
## Welch Two Sample t-test
##
## data: diet1.day21 and diet2.day21
## t = -1.2857, df = 15.325, p-value = 0.2176
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -98.09263 24.19263
## sample estimates:
## mean of x mean of y
## 177.75 214.70
Because we saved the test object, we can extract key test values by name with $.
#For example the t statistic
t <- diet.day21.ttest$statistic
t
## t
## -1.285711
#The p.value
p <- diet.day21.ttest$p.value
p
## [1] 0.2176326
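Because an htest object is a named list, names() shows every component you can pull out with $. The sketch below rebuilds the two day-21 vectors with base R subset() (an assumption made to keep the example self-contained; it is equivalent to the filter()/pull() pipeline above, and d1/d2 are illustrative names) and extracts the confidence interval and the Welch degrees of freedom.

```r
# Base-R equivalent of the filter()/pull() pipeline above:
# day-21 weights for diets 1 and 2, named by chick
d1 <- subset(ChickWeight, Time == 21 & Diet == 1)
d2 <- subset(ChickWeight, Time == 21 & Diet == 2)
diet1.day21 <- setNames(d1$weight, as.character(d1$Chick))
diet2.day21 <- setNames(d2$weight, as.character(d2$Chick))

diet.day21.ttest <- t.test(x = diet1.day21, y = diet2.day21)

# every component stored in the htest list
names(diet.day21.ttest)

# the 95% confidence interval is a length-2 numeric vector
ci <- diet.day21.ttest$conf.int
ci

# the (Welch-adjusted) degrees of freedom are stored in $parameter
welch.df <- diet.day21.ttest$parameter
welch.df
```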
#QUESTION: How would you extract the mean of x from the test object?
You can change the test defaults by setting additional arguments.
#For example, to test the one-sided alternative hypothesis that the mean of x is less than the mean of y:
t.test(x = diet1.day21, y = diet2.day21, alternative = "less")
##
## Welch Two Sample t-test
##
## data: diet1.day21 and diet2.day21
## t = -1.2857, df = 15.325, p-value = 0.1088
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf 13.36074
## sample estimates:
## mean of x mean of y
## 177.75 214.70
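Two other defaults you may want to change are var.equal (TRUE gives Student's pooled-variance t-test instead of Welch's) and conf.level. A short sketch, again rebuilding the day-21 data with base R subset() as an assumption:

```r
d1 <- subset(ChickWeight, Time == 21 & Diet == 1)
d2 <- subset(ChickWeight, Time == 21 & Diet == 2)

# Student's two-sample t-test (pooled variance) with a 99% confidence interval
student.ttest <- t.test(x = d1$weight, y = d2$weight,
                        var.equal = TRUE, conf.level = 0.99)

student.ttest$method                       # no longer labelled "Welch"
student.ttest$parameter                    # pooled df = n1 + n2 - 2 = 16 + 10 - 2
attr(student.ttest$conf.int, "conf.level") # confidence level of the interval
```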
What if you want to test whether the mean weight for Diet 1 on day 21 is significantly different from 200? For a one-sample test, you simply give t.test() only an x argument.
#by default mu=0, but here we set it to 200, per the problem.
t.test(x = diet1.day21, mu = 200)
##
## One Sample t-test
##
## data: diet1.day21
## t = -1.5161, df = 15, p-value = 0.1503
## alternative hypothesis: true mean is not equal to 200
## 95 percent confidence interval:
## 146.4699 209.0301
## sample estimates:
## mean of x
## 177.75
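The one-sample t statistic is the distance between the sample mean and mu in units of standard error, t = (mean(x) - mu) / (sd(x) / sqrt(n)), compared against a t distribution with n - 1 degrees of freedom. A minimal sketch that computes it by hand and checks it against t.test():

```r
d1 <- subset(ChickWeight, Time == 21 & Diet == 1)
x <- d1$weight
mu <- 200

# t = (sample mean - hypothesised mean) / standard error of the mean
n <- length(x)
t.manual <- (mean(x) - mu) / (sd(x) / sqrt(n))

# two-sided p-value from the t distribution with n - 1 df
p.manual <- 2 * pt(-abs(t.manual), df = n - 1)

t.builtin <- t.test(x, mu = mu)
c(manual = t.manual, builtin = unname(t.builtin$statistic))
```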
You can also pipe data into t.test(). This works best if you specify the variables as a formula. This is the same comparison of weight on Diet 1 vs. Diet 2 on day 21 as before, written as a pipe.
#filter for day 21
#filter to only keep diets 1 and 2
#the formula is response ~ group; "data = ." tells t.test to use the data coming from the pipe
ChickWeight %>%
filter(Time==21) %>%
filter(Diet %in% c(1,2)) %>%
t.test(weight ~ Diet, data=.)
##
## Welch Two Sample t-test
##
## data: weight by Diet
## t = -1.2857, df = 15.325, p-value = 0.2176
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## -98.09263 24.19263
## sample estimates:
## mean in group 1 mean in group 2
## 177.75 214.70
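The formula interface and the x/y interface run the same Welch test. A sketch confirming this, using base R subset() in place of the dplyr pipe (an assumption made only to keep the example dependency-free):

```r
day21 <- subset(ChickWeight, Time == 21 & Diet %in% c(1, 2))

# formula interface: response ~ grouping factor (must end up with exactly 2 groups)
formula.test <- t.test(weight ~ Diet, data = day21)

# x/y interface on the same data
xy.test <- t.test(x = day21$weight[day21$Diet == 1],
                  y = day21$weight[day21$Diet == 2])

c(formula = unname(formula.test$statistic), xy = unname(xy.test$statistic))
```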
#QUESTION: Using the pipe, how would you test if the mean sample weight on day 0 for Diet 1 is significantly different from 40?
#Hint: The formula for a one-sample test is y ~ 1
#Hint: the option for true value of the mean is "mu"
It’s very easy to switch to the non-parametric alternative to the t-test in R because the syntax is almost exactly the same.
# just switch t.test to wilcox.test to perform the Wilcoxon rank-sum test (also known as the Mann-Whitney U test).
diet.day21.wilcox = wilcox.test(x = diet1.day21, y = diet2.day21)
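The Wilcoxon result is the same kind of htest object, so you can print it or extract values with $. Because the day-21 weights contain ties, R cannot compute an exact p-value and falls back to a normal approximation with a warning; setting exact = FALSE requests the approximation explicitly. A sketch, rebuilding the data with base R subset() as an assumption:

```r
d1 <- subset(ChickWeight, Time == 21 & Diet == 1)
d2 <- subset(ChickWeight, Time == 21 & Diet == 2)

# exact = FALSE: use the normal approximation (ties rule out an exact p-value)
diet.day21.wilcox <- wilcox.test(x = d1$weight, y = d2$weight, exact = FALSE)
diet.day21.wilcox

# extract the W statistic and p-value exactly as for the t-test
W <- diet.day21.wilcox$statistic
diet.day21.wilcox$p.value
```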
It is also easy to do a paired test (which works on the difference within each subject). This is appropriate when you measure the same subject repeatedly, for example over time. For paired tests, it is important to check that the subjects actually match. Say you want to test whether, on average, the weight of chicks on Diet 1 on day 18 differs from their weight on day 21.
#get the weight on day 18
diet1.day18 <- ChickWeight %>% filter(Time==18) %>% filter(Diet==1) %>% pull(weight,name = Chick)
#check that the chicks match
print("day18")
## [1] "day18"
diet1.day18
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 17 19 20
## 171 187 187 154 199 160 250 134 100 112 184 185 81 248 123 120 107
print("day21")
## [1] "day21"
diet1.day21
## 1 2 3 4 5 6 7 9 10 11 12 13 14 17 19 20
## 205 215 202 157 223 157 305 98 124 175 205 96 266 142 157 117
#The chicks don't match: day 21 has no measurement for chick 8
#find the chicks that have measurements on both days
matching_chicks = intersect(names(diet1.day18),names(diet1.day21))
#subset both vectors to just the matching chicks, in the same order
diet1.day18 <- diet1.day18[matching_chicks]
diet1.day21 <- diet1.day21[matching_chicks]
#give both vectors and set paired = TRUE to perform a Wilcoxon signed-rank test
#be very careful that the x and y vectors list the subjects in the same order
wilcox.test(diet1.day18, diet1.day21, paired = TRUE)
##
## Wilcoxon signed rank test with continuity correction
##
## data: diet1.day18 and diet1.day21
## V = 7.5, p-value = 0.001912
## alternative hypothesis: true location shift is not equal to 0
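The parametric counterpart is the paired t-test, t.test(..., paired = TRUE), which is equivalent to a one-sample t-test on the per-chick differences. This sketch matches chicks by name with the same intersect() idea as above (the names d18, d21 and matching are illustrative) and checks that equivalence:

```r
d18 <- subset(ChickWeight, Time == 18 & Diet == 1)
d21 <- subset(ChickWeight, Time == 21 & Diet == 1)
diet1.day18 <- setNames(d18$weight, as.character(d18$Chick))
diet1.day21 <- setNames(d21$weight, as.character(d21$Chick))

# keep only chicks measured on both days, in the same order
matching <- intersect(names(diet1.day18), names(diet1.day21))
diet1.day18 <- diet1.day18[matching]
diet1.day21 <- diet1.day21[matching]

paired.ttest <- t.test(diet1.day18, diet1.day21, paired = TRUE)

# the same test written as a one-sample test on the differences
diff.ttest <- t.test(diet1.day18 - diet1.day21)

c(paired = unname(paired.ttest$statistic), differences = unname(diff.ttest$statistic))
```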
To perform a correlation test, use cor.test(). Let’s test whether the Diet 1 chicks' day-18 and day-21 weights are correlated.
#By default, cor.test performs a Pearson correlation
diet1.day.pearson = cor.test(diet1.day18,diet1.day21)
diet1.day.pearson
##
## Pearson's product-moment correlation
##
## data: diet1.day18 and diet1.day21
## t = 14.031, df = 14, p-value = 1.227e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9030743 0.9884872
## sample estimates:
## cor
## 0.9662357
#For a Spearman correlation
diet1.day.spearman = cor.test(diet1.day18,diet1.day21, method = "spearman")
diet1.day.spearman
##
## Spearman's rank correlation rho
##
## data: diet1.day18 and diet1.day21
## S = 20.086, p-value = 4.865e-10
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9704621
#it's clear they are pretty strongly associated
plot(diet1.day18, diet1.day21)
If you just want to compute r/rho, it is faster to use cor().
#pearson
cor(diet1.day18,diet1.day21)
## [1] 0.9662357
#spearman
cor(diet1.day18,diet1.day21, method = "spearman")
## [1] 0.9704621
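Spearman's rho is Pearson's r computed on the ranks of the data (average ranks for ties), which is why it is less sensitive to outliers and only assumes a monotonic relationship. A quick check on the same matched day-18/day-21 vectors (w18 and w21 are illustrative names):

```r
d18 <- subset(ChickWeight, Time == 18 & Diet == 1)
d21 <- subset(ChickWeight, Time == 21 & Diet == 1)
# match chicks present on both days (same steps as above)
w18 <- setNames(d18$weight, as.character(d18$Chick))
w21 <- setNames(d21$weight, as.character(d21$Chick))
matching <- intersect(names(w18), names(w21))
w18 <- w18[matching]
w21 <- w21[matching]

rho <- cor(w18, w21, method = "spearman")
# Pearson correlation of the ranks; rank() uses average ranks for ties by default
r.of.ranks <- cor(rank(w18), rank(w21))
c(spearman = rho, pearson.on.ranks = r.of.ranks)
```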
cor() also computes all pairwise correlations between the columns of a matrix. For example, you can correlate every pair of timepoints across all chicks.
#reshape the data into a chick x time matrix (one row per chick, one column per timepoint)
ChickWeight %>%
spread(key = Time, value = weight) %>%
dplyr::select(-Diet) %>%
mutate(Chick = as.numeric(as.character(Chick))) %>%
arrange(Chick) %>%
column_to_rownames("Chick") %>%
as.matrix() -> chick.mat
#correlate every pair of timepoints, using the chicks measured at both
chick.mat.cor <- cor(chick.mat, use = "pairwise.complete.obs")
#view the correlation matrix
chick.mat.cor
## 0 2 4 6 8 10
## 0 1.00000000 0.2069073 -0.02433406 -0.09325561 -0.1652851 -0.1755985
## 2 0.20690727 1.0000000 0.77083665 0.59451685 0.4556686 0.3688558
## 4 -0.02433406 0.7708367 1.00000000 0.89976804 0.8040703 0.7200326
## 6 -0.09325561 0.5945169 0.89976804 1.00000000 0.9275951 0.8734706
## 8 -0.16528505 0.4556686 0.80407028 0.92759508 1.0000000 0.9740431
## 10 -0.17559850 0.3688558 0.72003264 0.87347056 0.9740431 1.0000000
## 12 -0.21972092 0.3138995 0.65600796 0.81441931 0.9303109 0.9765563
## 14 -0.25103335 0.2048193 0.54763680 0.71893298 0.8741901 0.9403459
## 16 -0.30809136 0.1374907 0.45820389 0.62017870 0.7803073 0.8585937
## 18 -0.32217272 0.1308610 0.43052562 0.57236408 0.7158592 0.7907778
## 20 -0.32023244 0.1418881 0.43382097 0.56915926 0.6749878 0.7395542
## 21 -0.30212149 0.1363581 0.41594989 0.53096861 0.6293486 0.6927191
## 12 14 16 18 20 21
## 0 -0.2197209 -0.2510334 -0.3080914 -0.3221727 -0.3202324 -0.3021215
## 2 0.3138995 0.2048193 0.1374907 0.1308610 0.1418881 0.1363581
## 4 0.6560080 0.5476368 0.4582039 0.4305256 0.4338210 0.4159499
## 6 0.8144193 0.7189330 0.6201787 0.5723641 0.5691593 0.5309686
## 8 0.9303109 0.8741901 0.7803073 0.7158592 0.6749878 0.6293486
## 10 0.9765563 0.9403459 0.8585937 0.7907778 0.7395542 0.6927191
## 12 1.0000000 0.9821277 0.9277031 0.8636812 0.8035865 0.7648055
## 14 0.9821277 1.0000000 0.9740191 0.9260224 0.8751119 0.8401921
## 16 0.9277031 0.9740191 1.0000000 0.9789155 0.9434249 0.9185716
## 18 0.8636812 0.9260224 0.9789155 1.0000000 0.9840960 0.9710124
## 20 0.8035865 0.8751119 0.9434249 0.9840960 1.0000000 0.9945847
## 21 0.7648055 0.8401921 0.9185716 0.9710124 0.9945847 1.0000000
#plot Pearson's r as a heatmap
heatmap(chick.mat.cor,
Rowv = NA,
Colv = NA,
scale="none")
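Once you have the correlation matrix, you can index it like any other matrix, for example to rank the timepoints by how well they correlate with the final weight on day 21. This sketch builds the same chick x time matrix with base R tapply() (an alternative to the spread() pipeline, assumed here to avoid extra packages):

```r
# chick x time matrix: one cell per chick and timepoint, NA where a chick was not measured
chick.mat <- with(ChickWeight, tapply(weight, list(Chick, Time), mean))

chick.mat.cor <- cor(chick.mat, use = "pairwise.complete.obs")

# correlation of every other timepoint with day 21, strongest first
day21.cor <- chick.mat.cor[, "21"]
sort(day21.cor[names(day21.cor) != "21"], decreasing = TRUE)
```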
Sometimes you want to run many t-tests and extract the test statistics and p-values. You could do this with a for loop, an apply function, or in several ways within a dplyr pipe. The rstatix package is one of the more elegant ways to perform and store multiple statistical tests. We loaded it at the beginning of the workshop.
For each diet, let’s compare the weight of each timepoint to time zero.
#The setup is similar to t.test, but the function is t_test
#grouping by Diet repeats the tests within each diet group
#the formula weight ~ Time tests weight grouped by timepoint
#setting ref.group = "0" compares each timepoint only to time 0; p.adj holds the multiple-testing-adjusted p-values (Holm by default)
ChickWeight %>%
group_by(Diet) %>%
t_test(weight~Time,ref.group = "0") ->
diet.time0.tests
diet.time0.tests
## # A tibble: 44 × 11
## Diet .y. group1 group2 n1 n2 statistic df p p.adj
## * <fct> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 weight 0 2 20 20 -5.96 21.0 6.46e- 6 6.46e- 6
## 2 1 weight 0 4 20 19 -15.5 20.0 1.34e-12 1.47e-11
## 3 1 weight 0 6 20 19 -14.2 18.6 2.15e-11 2.15e-10
## 4 1 weight 0 8 20 19 -12.1 18.2 4 e-10 3.60e- 9
## 5 1 weight 0 10 20 19 -9.98 18.1 8.91e- 9 7.13e- 8
## 6 1 weight 0 12 20 19 -8.97 18.0 4.59e- 8 2.63e- 7
## 7 1 weight 0 14 20 18 -9.30 17.0 4.38e- 8 2.63e- 7
## 8 1 weight 0 16 20 17 -9.45 16.0 5.95e- 8 2.63e- 7
## 9 1 weight 0 18 20 17 -9.84 16.0 3.39e- 8 2.37e- 7
## 10 1 weight 0 20 20 17 -9.59 16.0 4.85e- 8 2.63e- 7
## # ℹ 34 more rows
## # ℹ 1 more variable: p.adj.signif <chr>
#without ref.group, t_test compares every pair of timepoints within each diet
ChickWeight %>%
group_by(Diet) %>%
t_test(weight~Time) ->
diet.all.times.tests
diet.all.times.tests
## # A tibble: 264 × 11
## Diet .y. group1 group2 n1 n2 statistic df p p.adj
## * <fct> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 weight 0 2 20 20 -5.96 21.0 6.46e- 6 1.94e- 4
## 2 1 weight 0 4 20 19 -15.5 20.0 1.34e-12 8.84e-11
## 3 1 weight 0 6 20 19 -14.2 18.6 2.15e-11 1.4 e- 9
## 4 1 weight 0 8 20 19 -12.1 18.2 4 e-10 2.52e- 8
## 5 1 weight 0 10 20 19 -9.98 18.1 8.91e- 9 5.44e- 7
## 6 1 weight 0 12 20 19 -8.97 18.0 4.59e- 8 2.57e- 6
## 7 1 weight 0 14 20 18 -9.30 17.0 4.38e- 8 2.54e- 6
## 8 1 weight 0 16 20 17 -9.45 16.0 5.95e- 8 3.21e- 6
## 9 1 weight 0 18 20 17 -9.84 16.0 3.39e- 8 2.03e- 6
## 10 1 weight 0 20 20 17 -9.59 16.0 4.85e- 8 2.67e- 6
## # ℹ 254 more rows
## # ℹ 1 more variable: p.adj.signif <chr>
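Because rstatix returns a tibble, you can filter and select the results like any other data frame. A sketch that switches to a Bonferroni correction through t_test()'s p.adjust.method argument and keeps only the day 21 vs. day 0 comparison for each diet (the name day21.vs.day0 is illustrative):

```r
library(dplyr)
library(rstatix)

# compare each timepoint to time 0 within each diet,
# this time with a Bonferroni correction instead of the default (Holm)
diet.time0.tests <- ChickWeight %>%
  group_by(Diet) %>%
  t_test(weight ~ Time, ref.group = "0", p.adjust.method = "bonferroni")

# keep only the day 21 vs day 0 comparison for each diet
day21.vs.day0 <- diet.time0.tests %>%
  filter(group2 == "21") %>%
  select(Diet, group1, group2, statistic, p, p.adj)
day21.vs.day0
```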
#similarly there are functions cor_test and wilcox_test
#QUESTION: In the pipe, how would you perform a correlation test for each timepoint vs. each other timepoint (similar to how we used cor)?
#Hint: you still need to transform the data into wide format
More examples and documentation can be found in the rstatix documentation.
From the beginner workshop you should be familiar with basic plotting in R. Base plotting in R is great for quick, basic plots. However, for more complex plots with publication quality aesthetics, the ggplot2 package is the standard in R.
The “gg” in ggplot2 stands for “grammar of graphics”: the package provides a consistent syntax for each of the elements that make up a graphic. We’ll start this section by going through the basic pieces of a ggplot graphic.
There are too many aspects of ggplot for one workshop, so please read more in the extensive documentation.
The first step is calling the function ggplot and giving it data. We will stick with ChickWeight for this.
#this calls ggplot and tells it the dataframe we are using
#on its own it draws only the empty panel the plot will go in, without any data
ggplot(data = ChickWeight)
#you can also specify data for ggplot using the pipe
#this is helpful because you can filter what goes into the plot
ChickWeight %>%
ggplot()
Aesthetics tell ggplot which variables in your data map to which attributes of the plot. It’s often helpful to specify some basic aesthetics globally at the beginning of the plot (you can change them for specific elements later). For example, let’s plot time on the x-axis and weight on the y-axis and color by diet.
#aesthetics are specified within aes()
#to add elements to the plot use the + operator
#now that it knows what x/y are it will display the axes
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet)
#you can also put the aesthetics inside the ggplot function
#ChickWeight %>%
# ggplot(mapping = aes(x = Time, y = weight, colour = Diet))
Geometries (the geom_* functions) are the layers that determine what kind of plot is being made. Let’s make a scatter plot (one point per observation) with the same aesthetic setup as before.
#we add geom_point to plot the points
#this is plotting every data point you give it
#note that since Diet is a factor it picks a discrete color for each diet
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point()
#if you make diet numeric (i.e. continuous) ggplot will choose a continuous palette
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = as.numeric(Diet)) +
geom_point()
You can also plot multiple geometries at a time. For example, let’s also add a line plot.
#we can add lines between all the points with geom_line()
#however, without further specification it connects all the points within each color group (each diet), so the lines zigzag between chicks
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line()
#We fix this by giving geom_line() a group aesthetic so it draws one line per chick.
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line(aes(group = Chick))
#Note that you can also set fixed aesthetics that don't depend on a variable by giving them outside aes()
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point(shape = 2, color = "black") +
geom_line(aes(group = Chick))
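A common pitfall is putting a constant inside aes(). Inside aes(), "black" is treated as a data value (a one-level factor), so ggplot colors it with its default palette and adds a legend; outside aes(), it is a fixed setting. A sketch comparing the two with layer_data(), which returns the computed data for a layer including the final colors:

```r
library(ggplot2)

# setting: a constant outside aes() -> every point is black, no legend
p.set <- ggplot(ChickWeight, aes(x = Time, y = weight)) +
  geom_point(color = "black")

# mapping: a constant inside aes() is treated as data,
# so it gets a default palette color (not black) and a legend entry "black"
p.map <- ggplot(ChickWeight, aes(x = Time, y = weight)) +
  geom_point(aes(color = "black"))

unique(layer_data(p.set)$colour)
unique(layer_data(p.map)$colour)
```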
You can turn a version of this plot into a boxplot by replacing geom_point() with geom_boxplot().
#Boxplots need a discrete x variable; because Time is continuous, it is not used to split the boxes and you get one box per diet
#outlier.color = NA hides the outlier points
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_boxplot(outlier.color = NA)
#Converting Time into a factor gives one box per timepoint and diet, which makes a lot more sense
ChickWeight %>%
ggplot() +
aes(x = factor(Time), y = weight, color = Diet) +
geom_boxplot(outlier.color = NA)
#QUESTION: Using the pipe, for day 21 only, how would you make a boxplot of weights, split by diet with a point on top for each chicken. Color the points by diet and make them triangles.
#Hint: `position = position_jitter()` is a nice option for points so that they don't plot on top of each other. Shape 17 is the solid triangle.
Some more geometries:
#stacked barplots
ChickWeight %>%
ggplot() +
aes(x = factor(Time), y = weight, fill = Diet) +
geom_bar(stat="identity")
#add a linear model
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_smooth(method = "lm")
#histogram
ChickWeight %>%
ggplot() +
aes(x = weight) +
geom_histogram(binwidth = 5)
#density plots
ChickWeight %>%
ggplot() +
aes(x = weight, color = Diet) +
geom_density()
#plot text labels
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet, label = Chick) +
geom_text()
#count plots: point size shows how many observations fall at each Time x Diet combination
ChickWeight %>%
ggplot() +
aes(x = factor(Time), y = Diet) +
geom_count()
#heatmap
ChickWeight %>%
ggplot() +
aes(x = factor(Time), y = Chick, fill = weight) +
geom_tile()
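In the stacked barplot, geom_bar(stat = "identity") stacks the raw weights, so each bar's height is the sum of all chick weights at that timepoint. geom_col() is ggplot2's shorthand for the same thing. A sketch checking that both give identical bar heights and that the day-21 bar equals the summed day-21 weight:

```r
library(ggplot2)

p.bar <- ggplot(ChickWeight, aes(x = factor(Time), y = weight, fill = Diet)) +
  geom_bar(stat = "identity")
p.col <- ggplot(ChickWeight, aes(x = factor(Time), y = weight, fill = Diet)) +
  geom_col()

bar.data <- layer_data(p.bar)
col.data <- layer_data(p.col)

# day 21 is the 12th level of factor(Time), i.e. x position 12
day21.top <- max(bar.data$ymax[as.numeric(bar.data$x) == 12])
c(bar.top = day21.top, summed.weight = sum(ChickWeight$weight[ChickWeight$Time == 21]))
```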
Scales control how data values are translated into aesthetics, for example which colors are used or how the axes are scaled. Scale functions follow the naming pattern scale_{aesthetic}_{type}.
For example, there are several ways to set the color scale:
#using scale_color_discrete with built in palette names
#palette.pals() lists the built-in palette names
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line(aes(group = Chick)) +
scale_color_discrete(palette = "Set2")
#specifying explicit colors with scale_color_manual or scale_color_discrete
#colors() lists all the named colors in R, or you can use hex codes
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line(aes(group = Chick)) +
scale_color_manual(values = c("1"="tomato3","2"="#EEB422",
"3"="springgreen3","4"="steelblue2"))
#scale using a named palette function such as viridis
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line(aes(group = Chick)) +
scale_color_viridis_d()
#continuous variables need a continuous color scale, such as scale_color_continuous
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = weight) +
geom_point() +
geom_line(aes(group = Chick)) +
scale_color_continuous(palette = "Set2")
#though gradient also works well
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = weight) +
geom_point() +
geom_line(aes(group = Chick)) +
scale_color_gradientn(colors = c("steelblue","grey","firebrick"))
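Going back to scale_color_manual(): if you use the same diet colors in many plots, it can help to define them once as a named vector whose names match the levels of Diet. The same scale call can also set the legend title (name) and the legend key labels (labels, given in level order). A sketch; diet.colors is an illustrative name:

```r
library(ggplot2)

# define the diet colors once; names must match the factor levels of Diet
diet.colors <- c("1" = "tomato3", "2" = "#EEB422",
                 "3" = "springgreen3", "4" = "steelblue2")

p <- ggplot(ChickWeight, aes(x = Time, y = weight, color = Diet)) +
  geom_point() +
  geom_line(aes(group = Chick)) +
  # name sets the legend title, labels renames the legend keys (in level order)
  scale_color_manual(values = diet.colors,
                     name = "Diet",
                     labels = c("Diet 1", "Diet 2", "Diet 3", "Diet 4"))

# the colors ggplot actually used for the points
unique(layer_data(p)$colour)
```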
#for some geoms there's a difference between fill and color: color sets the outline, fill sets the interior
ChickWeight %>%
ggplot() +
aes(x = factor(Time), y = weight, color = Diet) +
geom_boxplot(outlier.color = NA) +
scale_color_manual(values = c("1"="tomato3","2"="#EEB422",
"3"="springgreen3","4"="steelblue2"))
#scaling the fill
ChickWeight %>%
ggplot() +
aes(x = factor(Time), y = weight, fill = Diet) +
geom_boxplot(outlier.color = NA) +
scale_fill_manual(values = c("1"="tomato3","2"="#EEB422",
"3"="springgreen3","4"="steelblue2"))
#you can combine them
ChickWeight %>%
ggplot() +
aes(x = factor(Time), y = weight, fill = Diet, color = Diet) +
geom_boxplot(outlier.color = NA) +
scale_color_manual(values = c("1"="tomato4","2"="goldenrod4",
"3"="springgreen4","4"="steelblue4")) +
scale_fill_manual(values = c("1"="tomato1","2"="goldenrod1",
"3"="springgreen1","4"="steelblue1"))
Scales are also important for changing the axes and setting their limits. For example, you can use them to extend or truncate an axis, set different break points, use a log scale, or change the order of discrete values.
#make the y axis extend to 500 and adjust the axis breaks
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line(aes(group = Chick)) +
scale_y_continuous(limits = c(0,500), breaks = c(0,250,500))
#order the diet and scale y in log
ChickWeight %>%
ggplot() +
aes(x = Diet, y = weight, color = Diet) +
geom_boxplot() +
scale_x_discrete(limits = c("3","4","1","2")) +
scale_y_log10(limits = c(1,500),breaks = c(1,10,100,500))
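One subtlety when truncating an axis: limits set on a scale remove data outside the range before statistics such as boxplot medians are computed (with a warning about removed rows), whereas coord_cartesian() only zooms the view and keeps all the data. A sketch comparing the box medians (the middle column of the boxplot layer data) under both approaches:

```r
library(ggplot2)

base <- ggplot(ChickWeight, aes(x = Diet, y = weight)) + geom_boxplot()

# scale limits DROP data above 150 before the boxplot statistics are computed
p.scale <- base + scale_y_continuous(limits = c(0, 150))
# coord_cartesian() only zooms; the statistics still use all the data
p.zoom <- base + coord_cartesian(ylim = c(0, 150))

data.frame(all.data = layer_data(base)$middle,
           scale.limits = layer_data(p.scale)$middle,
           coord.zoom = layer_data(p.zoom)$middle)
```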
Facets let you split a plot into multiple plots by a variable. For example, split the lineplot by diet.
#facet_wrap() wraps the panels into as many rows as needed
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line(aes(group = Chick)) +
facet_wrap(~Diet)
#you can also give each panel its own y axis
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line(aes(group = Chick)) +
facet_wrap(~Diet, scales="free_y")
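#facet_grid() is similar, but lays panels out in rows (left of ~) and/or columns (right of ~)
#~Diet puts the diets in a single row; free_y has no visible effect here because facet_grid shares the y axis within a row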
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line(aes(group = Chick)) +
facet_grid(~Diet, scales="free_y")
#rows by Time, columns by Diet; scales = "free" with space = "free_x" makes each column only as wide as the number of chicks in that diet
ChickWeight %>%
ggplot() +
aes(x = Chick, y = weight, color = Diet) +
geom_point() +
facet_grid(Time~Diet, scales="free", space="free_x")
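If you want a single row of panels where each diet still gets its own y axis, facet_wrap() with nrow = 1 can do both, because facet_wrap() frees scales per panel. The labeller = label_both option prints the variable name in the strips ("Diet: 1"). A sketch, using the PANEL column of the layer data to count panels:

```r
library(ggplot2)

p <- ggplot(ChickWeight, aes(x = Time, y = weight, color = Diet)) +
  geom_point() +
  geom_line(aes(group = Chick)) +
  # one row of panels, independent y axes, strips labelled "Diet: 1" etc.
  facet_wrap(~Diet, nrow = 1, scales = "free_y", labeller = label_both)

# one panel per diet
length(unique(layer_data(p)$PANEL))
```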
This is probably the piece you will use least often. Coordinate systems let you fix the aspect ratio, flip the axes, or convert to polar coordinates.
#flip the boxplots
ChickWeight %>%
ggplot() +
aes(x = Diet, y = weight, color = Diet) +
geom_boxplot() +
coord_flip()
#wrap the time axis around a circle using polar coordinates
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line(aes(group = Chick)) +
coord_polar(theta = 'x')
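Since ggplot2 3.3.0, many geoms infer their orientation from the aesthetics, so mapping the discrete variable to y gives horizontal boxplots without coord_flip(). A sketch; the flipped_aes column checked here is an internal field ggplot2 uses to record the orientation, so treat it as an implementation detail:

```r
library(ggplot2)

# weight on x, Diet on y: ggplot infers a horizontal orientation
p.horizontal <- ggplot(ChickWeight, aes(x = weight, y = Diet, color = Diet)) +
  geom_boxplot()

# the same orientation can be requested explicitly
p.explicit <- ggplot(ChickWeight, aes(x = weight, y = Diet, color = Diet)) +
  geom_boxplot(orientation = "y")

unique(layer_data(p.horizontal)$flipped_aes)
```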
Themes control the non-data elements of a plot (backgrounds, grid lines, text, legends) and are what make ggplot graphics look really good. There are several built-in themes, but usually you will want to customize on top of one.
#theme_classic is a decent start
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line(aes(group = Chick)) +
theme_classic()
#so is minimal
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line(aes(group = Chick)) +
theme_minimal()
#some remove too much
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line(aes(group = Chick)) +
theme_void()
#some are hideous
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line(aes(group = Chick)) +
theme_dark()
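theme() is also used to adjust specific elements of the panels and axes.
#you specify the element to change and describe its new look with an element_*() function (element_blank() removes it)
#remove the panel background and turn the grid black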
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line(aes(group = Chick)) +
facet_wrap(~Diet) +
theme(panel.background = element_blank()) +
theme(panel.grid = element_line(color = "black"))
#strip.* elements control the facet label boxes
#turn the facet boxes dark and make the text larger and white
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line(aes(group = Chick)) +
facet_wrap(~Diet) +
theme(strip.background = element_rect(fill = "grey20")) +
theme(strip.text = element_text(size=12, color = "white"))
#Add a title and override the axis titles
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line(aes(group = Chick)) +
facet_wrap(~Diet) +
ggtitle("MyPlot") +
xlab("time since dose") +
ylab("weight in grams")
#Adjust text size/color/directions/alignment
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line(aes(group = Chick)) +
facet_wrap(~Diet) +
ggtitle("MyPlot") +
theme(plot.title = element_text(hjust = 0.5, size=16)) +
theme(axis.title = element_text(size=14, color = "darkgrey")) +
theme(axis.text.x = element_text(size=10, angle=90, hjust = 1, vjust = 0.5))
#draw or adjust axis lines
ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line(aes(group = Chick)) +
facet_wrap(~Diet) +
ggtitle("MyPlot") +
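#since ggplot2 3.4.0, line thickness is set with linewidth (size is deprecated for lines)
theme(axis.line = element_line(linetype = "dashed", color = "red", linewidth = 0.5)) +
theme(axis.ticks.y = element_line(color = "blue", linewidth = 3))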
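When you reuse the same customizations across plots, you can store them in one theme object: start from a complete theme and add theme() adjustments to it. labs() sets the title, axis titles and legend title in a single call. A sketch; my_theme is an illustrative name, and ggplotGrob() is used only to confirm the plot renders:

```r
library(ggplot2)

# a reusable theme: a complete theme plus adjustments layered on top
my_theme <- theme_classic() +
  theme(strip.background = element_rect(fill = "grey20"),
        strip.text = element_text(size = 12, color = "white"),
        plot.title = element_text(hjust = 0.5, size = 16))

p <- ggplot(ChickWeight, aes(x = Time, y = weight, color = Diet)) +
  geom_point() +
  geom_line(aes(group = Chick)) +
  facet_wrap(~Diet) +
  # labs() sets the title, axis titles and legend title in one call
  labs(title = "MyPlot", x = "time since dose", y = "weight in grams", color = "diet") +
  my_theme

# ggplotGrob() builds the final drawable table; it errors if a theme element is invalid
g <- ggplotGrob(p)
```

Calling theme_set(my_theme) would make it the default theme for every later plot in the session.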
You can also save a plot to a variable and add elements to it later.
p <- ChickWeight %>%
ggplot() +
aes(x = Time, y = weight, color = Diet) +
geom_point() +
geom_line(aes(group = Chick)) +
facet_wrap(~Diet) +
ggtitle("MyPlot")
p
p + theme_minimal()
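A saved plot object can also be written to a file with ggsave(), which picks the graphics device from the file extension; width and height are in inches by default. A sketch that writes to R's temporary directory (replace out.file with your own path):

```r
library(ggplot2)

p <- ggplot(ChickWeight, aes(x = Time, y = weight, color = Diet)) +
  geom_point() +
  geom_line(aes(group = Chick)) +
  facet_wrap(~Diet) +
  ggtitle("MyPlot")

# the .pdf extension selects the PDF device; size is 8 x 6 inches
out.file <- file.path(tempdir(), "MyPlot.pdf")
ggsave(out.file, plot = p, width = 8, height = 6)
file.exists(out.file)
```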
Documentation on every ggplot function: ggplot documentation
Book by the package author with tons of examples: ggplot book
For more inspiration on making pretty plots: The R graph gallery.
For heatmaps and heatmap-related plots (like oncoprints): ComplexHeatmap package
For a huge list of color palettes: paletteer
Package of plot themes: ggthemes
Combining multiple plots into a single graphic: patchwork and cowplot
Ginormous list of ggplot compatible extension packages: awesome-ggplot